ABSTRACT

Mitochondrial DNA (mtDNA) rearrangements are key 
events in the development of many diseases. 
Investigations of mtDNA regions affected by re- 
arrangements (i.e. breakpoints) can lead to import- 
ant discoveries about rearrangement mechanisms 
and can offer important clues about the causes of 
mitochondrial diseases. Here, we present the mito- 
chondrial DNA breakpoints database (MitoBreak; 
http://mitobreak.portugene.com), a free, web-ac- 
cessible comprehensive list of breakpoints from 
three classes of somatic mtDNA rearrangements: 
circular deleted (deletions), circular partially dupli- 
cated (duplications) and linear mtDNAs. Currently, 
MitoBreak contains >1400 mtDNA rearrangements 
from seven species (Homo sapiens, Mus musculus, 
Rattus norvegicus, Macaca mulatta, Drosophila 
melanogaster, Caenorhabditis elegans and 
Podospora anserina) and their associated pheno- 
typic information collected from nearly 400 publica- 
tions. The database allows researchers to perform 
multiple types of data analyses through user- 
friendly interfaces with full or partial datasets. It 
also permits the download of curated data and the 
submission of new mtDNA rearrangements. For 
each reported case, MitoBreak also documents the 
precise breakpoint positions, junction sequences, 
disease or associated symptoms and links to the 
related publications, providing a useful resource to 
study the causes and consequences of mtDNA 
structural alterations. 



INTRODUCTION 

A genomic rearrangement is a large-scale modification of
the genome caused by a deletion, inversion, duplication,
insertion or translocation (1,2). The genomic region where
a junction between normal and rearranged DNA is
detected is called a breakpoint and can be located by
comparing the sequences of the rearranged genomes.
Some genomic regions, known as breakpoint hotspots or 
fragile sites (3,4), appear to be intrinsically prone to 
breakage and reorganization (5). The molecular character- 
ization of these unstable genomic regions is a prerequisite 
for a better comprehension of the mechanisms underlying 
rearrangements. 

Mitochondrial DNA (mtDNA) exhibits extraordinary 
genetic and physical diversity across different eukaryotic 
lineages (6) and among different cell and tissue types (7).
Despite the notable variations in structural organization, 
the mitochondrial genome always encodes a small number 
of proteins essential to the oxidative phosphorylation 
complex (2,8). Some mutational changes in the mitochon- 
drial genome significantly affect cellular respiration, 
causing a clinically heterogeneous group of disorders
related to oxidative phosphorylation dysfunction known 
as mitochondrial diseases. The most common types of 
genetic alterations in mtDNA are point mutations, 
partial deletions and tandem direct duplications (9,10).

A circular deleted mtDNA (also known as 'mtDNA
deletion'; Figure 1A) is a short mtDNA molecule that
lacks a section of the genome but remains in a circular
format. Deleted mtDNA molecules have been found to be
associated with mitochondrial diseases such as progressive
external ophthalmoplegia (PEO), Kearns-Sayre syndrome 
(KSS), Pearson syndrome (PS) and mitochondrial 
neurogastrointestinal encephalomyopathy (MNGIE), 
among other diseases. Furthermore, the accumulation of 
deleted mtDNA molecules might contribute to the aging 
process (11), Parkinson's disease (12) or inclusion body 
myositis (13). 

A circular partially duplicated mtDNA (also known as
'mtDNA duplication'; Figure 1B) is a larger than normal
mtDNA molecule because of the presence of a region that
was tandemly duplicated. The pathogenic nature of duplications
remains uncertain, although they have been found
to be associated with KSS, PEO, PS and mitochondrial
myopathies, as well as with aged tissues (14,15). Deleted
and duplicated mtDNAs can coexist in the same individual,
with the non-deleted mtDNA region being the tandem
duplicated region in partial duplications (16,17).

Figure 1. Schematic representation of a circular deleted (A), circular partially duplicated (B), full-length linear (C) and short linear (D) mtDNA
molecule. The first (5') and second (3') breakpoints in the reference mtDNA L-strand define an mtDNA deletion or duplication. The proximal
retained regions of the 5' breakpoint (5fl; 5' flanking; dark blue section) and 3' breakpoint (3fl; 3' flanking; dark red section) are found in mtDNA
deletions and duplications (occurring twice in duplicated mtDNA). The proximal deleted regions of the 5' breakpoint (5del; 5' deleted; light blue
section) and 3' breakpoint (3del; 3' deleted; light red section) are removed in mtDNA deletions. Full-length linear mtDNA is defined by a single
breakpoint and has no deleted sections. Shorter linear mtDNAs are defined by two breakpoints.

A third class of breakpoints is observed in linear
mtDNA molecules, which can result from one or more
double-strand breaks that open the circular conformation.
In this case, a single break produces a full-length linear
mtDNA (Figure 1C), and two or more breaks result in
short linear mtDNAs (Figure 1D). Short linear mtDNA
molecules are common in the mtDNA mutator mice that
express a proofreading-deficient mtDNA polymerase
(18,19). It has also been suggested that linear mtDNAs
caused by double-strand breaks are associated with the 
formation of mtDNA deletions (20). 

The investigation of mtDNA rearrangements has been 
largely confined to the study of human diseases. The 
absence of a comprehensive database of breakpoints has 
restricted the discovery of general DNA sequence features 
in mtDNA rearrangements. Although some useful 
mtDNA databases are available [e.g. MITOMAP (21) 
and MitoTool (22)], a comprehensive catalogue of the dif- 
ferent classes of mtDNA rearrangements from different 
species is still lacking. Therefore, we have developed the 
mitochondrial DNA breakpoints (MitoBreak) database, a 
manually curated database containing mitochondrial 
DNA breakpoints and their associated clinical and phenotypic
information. The MitoBreak database integrates the
existing information on mtDNA breakpoints from different
species with user-friendly visualization and analysis
tools. The goal is to collect and maintain all relevant
data on mtDNA breakpoints and present it in an easily
accessible format.

BASIC CONCEPTS 

The MitoBreak database describes breakpoints from three 
types of mtDNA rearrangements: circular deleted, circular
partially-duplicated and linear mtDNAs (Figure 1). The 
mtDNA deletions and duplications are usually defined by 
a combination of two reference numbers that identify the 
location of the 5' and 3' breakpoints. The 5' breakpoint is 
positioned to the left of the 5' break and the 3' breakpoint 
is positioned to the right of the 3' break, meaning that
both breakpoints are retained in the deleted mtDNAs.
We used the terms '5fl' to describe the upstream flanking
regions of the 5' breakpoints and '3fl' to describe the
downstream regions of the 3' breakpoints (i.e. the
proximal retained sequences). These 5fl and 3fl regions
are next to each other in deleted mtDNAs (Figure 1A).
The DNA segment that is removed in a mtDNA deletion 
is flanked by the downstream region of the 5' breakpoint 
(5del) and the upstream region of the 3' breakpoint (3del), 
i.e. the proximal deleted sequences. Duplications have a 
junction site connecting the 5fl and 3fl, which resembles 
the junction site of mtDNA deletions (Figure 1B). The
full-length linear mtDNA is defined by a single breakpoint
position (Figure 1C), while short linear mtDNAs can be
defined by two breakpoints (Figure 1D). The breakpoints
of linear mtDNA are difficult to determine and have
rarely been reported in the literature. Currently, MitoBreak only
describes the short linear mtDNAs found in the mtDNA
mutator mice (19).

The 5' and 3' breakpoints are often located within 
perfect direct repeats, i.e. identical DNA sequences 
found in different locations of the mitochondrial genome 
(e.g. ACCTCCCTCACCA; 8470-8482/13447-13459)
(Figure 2A and B). Therefore, the junction sequence of
both deletions and duplications may retain 0, 1 or 2
copies of the direct repeats (Figure 2C). When the rearrangement
retains a single copy of the direct repeat, it
is impossible to identify the exact sites of the breakage
events. For this reason, an mtDNA rearrangement may
be described by an interval of values (e.g. 8469-8482:13447-13460)
or by arbitrary positions inside the
repeated region. To avoid including redundant data in 
our datasets, the location of the breakpoint, when in the 
presence of homology, was standardized by always placing 
both breakpoints downstream (on the right) of the direct 
repeats (in the previous example, 8482:13460; Figure 2B), 
as previously described (23). 
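
The standardization rule can be illustrated with a minimal Python sketch. It is not the MitoBreak implementation; the function name, the toy reference sequence and the position convention (1-based, with the 5' breakpoint as the last retained base and the 3' breakpoint as the first retained base) are assumptions made for illustration. The idea is that shifting both breakpoints one base to the right does not change the junction sequence as long as the base following the 5' breakpoint is identical to the base at the 3' breakpoint.

```python
def standardize_breakpoints(reference: str, bp5: int, bp3: int) -> tuple[int, int]:
    """Shift a breakpoint pair to its rightmost equivalent position.

    reference: reference sequence of the circular molecule (hypothetical toy input)
    bp5: 1-based position of the last retained base before the junction
    bp3: 1-based position of the first retained base after the junction
    """
    n = len(reference)

    def base(position: int) -> str:
        # 1-based lookup that wraps around the circular molecule.
        return reference[(position - 1) % n]

    shifts = 0
    # Moving both breakpoints one base to the right leaves the junction
    # sequence unchanged whenever the first deleted base (bp5 + 1) equals
    # the first retained base on the 3' side (bp3).
    while shifts < n and base(bp5 + 1) == base(bp3):
        bp5 += 1
        bp3 += 1
        shifts += 1
    return (bp5 - 1) % n + 1, (bp3 - 1) % n + 1


# Toy sequence with the direct repeat ACCTC at positions 4-8 and 15-19.
TOY_REFERENCE = "TTG" + "ACCTC" + "AAGGGT" + "ACCTC" + "TTT"

print(standardize_breakpoints(TOY_REFERENCE, 3, 15))  # leftmost description
print(standardize_breakpoints(TOY_REFERENCE, 5, 17))  # arbitrary position inside the repeat
```

Both calls return (8, 20), i.e. the same rearrangement reported at different positions inside the repeat collapses to a single entry.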

In the case of human mtDNA, we considered the origins 
of H-strand (Oh) and L-strand (Ol) replications accord- 
ing to the strand-displacement model of mtDNA replica- 
tion (24-27). Although each replication origin is defined 
by a range of nucleotides, we used nucleotide positions
407 for Oh and 5747 for Ol (28-30), which
included the mtDNA regions encoding the RNA
fragments used as primers to initiate DNA synthesis. In
this way, the minor arc (shortest mtDNA section
between Ol and Oh) is located between positions 408
and 5746, while the major arc (largest mtDNA section
between Ol and Oh) is defined by positions 5747-407.
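
As a small illustration of these arc definitions, the following Python sketch assigns a human mtDNA position to the minor or major arc. The function name is hypothetical, and the genome length of 16569 bp (the revised Cambridge Reference Sequence) is an assumption, since the text does not state which length is used.

```python
MTDNA_LENGTH = 16569  # assumed length of the human reference mtDNA (rCRS)
O_H = 407             # position used for the H-strand origin
O_L = 5747            # position used for the L-strand origin


def arc_of(position: int) -> str:
    """Return 'minor' for positions 408-5746 and 'major' for 5747-407."""
    if not 1 <= position <= MTDNA_LENGTH:
        raise ValueError(f"position {position} is outside 1..{MTDNA_LENGTH}")
    if O_H < position < O_L:
        return "minor"
    # The major arc wraps around the numbering origin: 5747..16569 and 1..407.
    return "major"


print(arc_of(8482), arc_of(13460))  # both breakpoints of the example deletion
```

Under these assumptions, both breakpoints of the 8482:13460 example fall in the major arc.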



DATA CURATION AND COLLECTION 

Currently, the MitoBreak database describes 1472 mtDNA 
rearrangements from seven species: Homo sapiens, Mus 
musculus, Rattus norvegicus, Macaca mulatta, Drosophila
melanogaster, Caenorhabditis elegans and Podospora
anserina (Table 1). The database was constructed using 
information from 388 peer-reviewed papers and 2 PhD 
theses published from 1983 to 2013 (Table 1), as well as 
information gathered from the MITOMAP and MitoTool 
databases. The information associated with each mtDNA 
rearrangement described in MitoBreak was manually 
curated. We started by comparing and numbering the 
reported 5' and 3' breakpoints or junction sequences ac- 
cording to the reference mtDNA sequence (Table 1). We 
only considered rearrangements from which the break- 
point positions have been confirmed by sequencing 
analysis. The numbering of the breakpoints located 
within direct repeats was also corrected according to our 
standardization procedure (Figure 2). The datasets 
comprise non-redundant data, i.e. only different combin- 
ations of 5' and 3' breakpoints are represented. 
Nevertheless, all references associated with each 
rearranged mtDNA are described. 
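
The non-redundancy rule (one entry per distinct pair of breakpoints, with every supporting reference kept) can be sketched as follows. This is an illustrative Python fragment, not the curation pipeline itself; the function name, the record layout and the reference labels are hypothetical placeholders.

```python
from collections import defaultdict


def merge_reports(reports):
    """Collapse reports sharing the same (5', 3') breakpoints into one entry.

    reports: iterable of (bp5, bp3, reference_label) tuples (hypothetical layout)
    Returns a dict mapping (bp5, bp3) to the list of references reporting it.
    """
    merged = defaultdict(list)
    for bp5, bp3, reference_label in reports:
        if reference_label not in merged[(bp5, bp3)]:
            merged[(bp5, bp3)].append(reference_label)
    return dict(merged)


# Placeholder records; the labels do not correspond to real publications.
reports = [
    (8482, 13460, "ref_A"),
    (8482, 13460, "ref_B"),
    (8482, 13460, "ref_A"),
    (100, 200, "ref_C"),
]
merged = merge_reports(reports)
print(len(merged), merged[(8482, 13460)])
```

Three reports of the same standardized breakpoints become a single entry that still lists both distinct references.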
Figure 2. The positioning of the breakpoints in the presence of direct repeats. (A) The precise location of the 5' and 3' breakpoints is unknown for
several reported mtDNA rearrangements because of the presence of direct repeats (underlined, bold letters). (B) When only one copy of the direct 
repeat is maintained, the 5' and 3' breakpoints were placed downstream (on the right) of the homology, according to the L-strand numbering. (C) 
Representation of the possible number of direct repeats (DR) retained in the rearrangement junction of mtDNA deletions and duplications. When 
the breakpoints are located in a DR, the resulting junction sequence can retain 0, 1 or 2 copies of the repeated motif (green box). 



Table 1. Summary of mtDNA rearrangements in MitoBreak

Species          Rearrangement type  Rearrangements (n)  Number of publications  Reference mtDNA sequence
H. sapiens       Deletions           805
                 Duplications        44
M. musculus      Deletions           245
                 Linear              31
R. norvegicus    Deletions           216
M. mulatta       Deletions           58
D. melanogaster  Deletions           35
P. anserina      Deletions           32
C. elegans       Deletions           6



We then retrieved the phenotypic information 
associated with each mtDNA rearrangement from each 
peer-reviewed publication. In the case of human mtDNA
deletions and duplications, we also organized the reported
cases into seven major categories based on the number and
characteristics of the clinical or phenotypical features:
single mtDNA deletions, multiple mtDNA deletions,
healthy tissues, Parkinson's disease, inclusion body myositis,
tumour and other clinical features (31). The mtDNA
rearrangements from M. musculus and R. norvegicus were
organized according to the strain. A section with unpublished
datasets is also available for those rearrangements
not described in a peer-reviewed publication. 



DATABASE ORGANIZATION 

The MitoBreak database consists of three major
components: datasets, the classifier tool and the submit 
tool. The users can access any of these components 
through the top and bottom navigation bars of all 
pages. The datasets section is subdivided into two major 
sections: general statistics and individual rearrangement 
page. The classifier tool allows users to standardize the 
positions of the rearrangement breakpoints when they 
are located in direct repeats and to verify if the rearrange- 
ment is already present in the database. The submit tool 
allows users to easily submit a new mtDNA rearrange- 
ment to MitoBreak or to submit supplementary data to 
an existing case. 

Datasets 

The datasets section is accessible through a table 
describing the species, type of rearrangement, number of 
reported cases and the number of publications used to 
build the dataset. After selecting a particular rearrange- 
ment type, users will be directed to an interactive table 
listing all cases, including the name and location of the
rearrangement, relevant clinical features, references, etc.
The contents of this table can be rearranged in multiple 
ways by multi-column sorting, scrolling options for table 
viewport and user defined searches. Additionally, the 
dataset can be filtered by categories and subcategories 
using the filter boxes on the top of the page. The full or 
filtered dataset and the flanking regions can then be 
copied, printed or downloaded in comma-separated
values (CSV) or Excel format. The 'All references' button opens
a full list of publications used to build the selected dataset,
with a connection to the PubMed website
(http://www.ncbi.nlm.nih.gov/pubmed). Users also have the opportunity
to reach the individual page of each rearrangement by
clicking on its name in the first column of the table. The 
'General Statistics' button opens a series of descriptive 
analyses, performed dynamically, over the full or filtered 
dataset. 
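
For readers who prefer to work offline, the filter-then-download workflow can be mimicked with a few lines of Python. The sketch below is only an illustration: the field names and the two records are placeholders and do not reflect the actual column layout or content of a MitoBreak export.

```python
import csv
import io


def filter_records(records, **criteria):
    """Keep the records whose fields match every given criterion."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


def to_csv(records, fieldnames):
    """Serialize records to comma-separated values, header line first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Placeholder rows with hypothetical field names and made-up values.
FIELDS = ["name", "bp5", "bp3", "group"]
records = [
    {"name": "example_1", "bp5": 8482, "bp3": 13460, "group": "healthy tissues"},
    {"name": "example_2", "bp5": 100, "bp3": 200, "group": "tumour"},
]

print(to_csv(filter_records(records, group="tumour"), FIELDS))
```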



General statistics 

The general statistics section provides an overview of the 
information present in a MitoBreak dataset (full or filtered 
by the user). The displayed statistics vary according to the 
type of rearrangement, the species and the available data. 
For instance, the general statistics for human mtDNA de- 
letions include the breakpoint distributions, deletion 
lengths, genomic locations of the deleted regions, distribu- 
tion in sub-groups and a circular visualization of the 
mtDNA with all deletions. The general statistics are pre- 
sented using tables, interactive histograms and/or 
scatterplots (Figure 3). The charts are based on the 
Highcharts tool (http://www.highcharts.com/) where 
users can visualize the raw data of each chart point on 
mouse-over, zoom each axis, select which series will be 
presented by clicking on the legend and export the chart 
in png, jpeg, pdf or svg format. 
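
The deletion-length statistic can be reproduced from a pair of breakpoints with simple arithmetic. The Python sketch below assumes the convention described under Basic Concepts (both breakpoints are retained, so the deleted bases are those strictly between them) and a human genome length of 16569 bp; the function names and the bin size are illustrative choices, not a description of how MitoBreak computes its charts.

```python
from collections import Counter


def deletion_length(bp5: int, bp3: int, genome_length: int) -> int:
    """Number of bases strictly between the two retained breakpoints.

    The modulo handles deletions that span the numbering origin of the
    circular molecule (bp3 numerically smaller than bp5).
    """
    return (bp3 - bp5 - 1) % genome_length


def length_histogram(pairs, genome_length: int, bin_size: int = 1000) -> dict:
    """Count deletions per length bin, keyed by the lower bound of each bin."""
    counts = Counter(
        (deletion_length(bp5, bp3, genome_length) // bin_size) * bin_size
        for bp5, bp3 in pairs
    )
    return dict(sorted(counts.items()))


HUMAN_MTDNA_LENGTH = 16569  # assumed reference length (rCRS)

print(deletion_length(8482, 13460, HUMAN_MTDNA_LENGTH))
print(length_histogram([(8482, 13460), (8000, 13000), (16000, 500)], HUMAN_MTDNA_LENGTH))
```

For the standardized example 8482:13460 the sketch gives 4977 deleted bases; the second and third pairs are made-up inputs that exercise the binning and the wrap-around case.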

Individual page 

The individual page of each rearrangement can be 
accessed through the first column of the dataset table, 
which displays the breakpoints. In the case of deletions 
and duplications, users can visualize the flanking se- 
quences of the breakpoints and the length and location 
of the direct repeats (if present). The locations of the 5' 
and 3' breakpoints are also shown in a scatterplot (red 
dot), while the rearrangement length is shown in a histo- 
gram (red bar). These two charts also display the data for 
all available breakpoints in the dataset in grey dots or 
bars; therefore, the individual rearrangement can be 
analyzed in the broader context of all reported cases. 
The rearrangement is also represented in a circular 
mtDNA plot. The phenotype groups that include the 
mtDNA rearrangement and the references are also made 
available on the individual page (Figure 4). 
Figure 3. The general statistics section of MitoBreak for a dataset with mtDNA deletions. This section provides a series of descriptive analyses such
as (A) deletion lengths, (B) breakpoint distributions, (C) analysis by group, (D) locations of the deleted regions and (E) mtDNA circular visualiza- 
tion. The button opening each analysis is shown in grey. 



Classifier tool 

The classifier tool allows users to compare any mtDNA
rearrangement obtained during their research with the full
dataset of published rearrangements present in MitoBreak.
To compare a rearrangement with the complete dataset, the
classifier tool also corrects the position of the breakpoints
when located in direct repeats (Figure 2). The breakpoints
can be submitted in two formats: (i) a pair of breakpoint
numbers as identified by the user in the reference mtDNA
sequence or (ii) the rearrangement junction sequence, i.e.
where the normal mtDNA is disrupted by rearrangement.
If a sequence junction is provided by the user, a BLAST (32)
analysis between the given sequence and the reference
mtDNA genome is performed. The classifier tool
provides the same information available for the individual
pages described previously, but for the newly submitted
rearrangement, as well as an indication of the number of
cases already present in MitoBreak with the same
breakpoints.

Figure 4. The individual page section of MitoBreak for a mtDNA deletion. This section describes for each rearrangement: (A) general characteristics
of the selected rearrangement, (B) clinical or phenotypical groups where the rearrangement was found, (C) flanking sequences with highlighted direct
repeats, (D) location of the 5' and 3' breakpoints in a 2D plot, (E) circular representation of the rearranged mtDNA region, (F) rearrangement length
in comparison with the full dataset and (G) references.
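
To make the junction-sequence input more concrete, the Python sketch below maps a junction onto a reference by exact string matching and then counts matching entries in a list of known breakpoint pairs. It is a deliberately naive stand-in for the BLAST step, not a reimplementation of it: it tolerates no mismatches, ignores the circularity of the molecule and takes the first match of each flank. All names, the toy reference and the list of known pairs are hypothetical.

```python
def locate_junction(reference: str, junction: str, min_flank: int = 4):
    """Split a junction into two flanks that match the reference exactly.

    Returns (bp5, bp3) as 1-based positions of the last retained base of the
    left flank and the first retained base of the right flank, or None.
    Trying the longest left flank first places both breakpoints downstream
    of any direct repeat shared by the two flanks.
    """
    for split in range(len(junction) - min_flank, min_flank - 1, -1):
        left, right = junction[:split], junction[split:]
        start_left = reference.find(left)
        if start_left == -1:
            continue
        bp5 = start_left + len(left)            # 1-based end of the left flank
        start_right = reference.find(right, bp5)  # search downstream of bp5
        if start_right == -1 or start_right == bp5:
            # Not found, or the flanks are contiguous (no rearrangement).
            continue
        return bp5, start_right + 1
    return None


# Toy sequence with the direct repeat ACCTC at positions 4-8 and 15-19.
TOY_REFERENCE = "TTG" + "ACCTC" + "AAGGGT" + "ACCTC" + "TTT"
known_pairs = [(8, 20), (8, 20), (2, 12)]  # placeholder "database" content

pair = locate_junction(TOY_REFERENCE, "TTGACCTCTTT", min_flank=3)
print(pair, known_pairs.count(pair))
```

With the toy junction, which retains one copy of the repeat, the sketch reports the breakpoints (8, 20) and finds two existing cases with the same pair.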

Submit tool 

The submit tool permits the user to submit new mtDNA 
rearrangements to MitoBreak. The procedure is similar to 
that used in the classifier tool. After the indication of the 
breakpoints or the rearrangement junction, the main char- 
acteristics of the rearrangement are shown for confirm- 
ation. The submission procedure ends with a contact 
form. 

SUPPORT 

We provide relevant information on mtDNA rearrangements
(numbering and location of breakpoints, definition
of flanking regions, breakpoints on direct repeats, abbreviations,
etc.) in the Documentation section. Detailed
tutorials on how to use the MitoBreak tools can be found
in the Tutorial section, such as 'How to analyze a set of
breakpoints already present in the database', 'How to
locate a pair of breakpoints within the existing datasets'
and 'How to submit new breakpoints to the database'.
User support can be obtained through the contact form 
available on the Contact & Support page or via email at 
mitobreak@gmail.com. 

AVAILABILITY AND DESIGN 

The MitoBreak database is available at
http://mitobreak.portugene.com. It uses a SQLite database for data storage
and runs on an Apache web server using CGI-Perl and
JavaScript to generate dynamic HTML pages. The dataset
tables were generated using the jQuery plugin DataTables
v1.9.4 (http://datatables.net/). The interactive graphs were
created using Highcharts v3.0 (http://www.highcharts.com/),
and the circular mtDNA representations were
made using Circos v0.64 (33).
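
As an illustration of SQLite-backed storage, the sketch below loads the per-dataset counts of Table 1 into an in-memory SQLite database and aggregates them. It is written in Python for brevity, whereas MitoBreak itself uses CGI-Perl; the table name, the column names and the function name are hypothetical, since the actual schema is not described here.

```python
import sqlite3

# Counts taken from Table 1 (species, rearrangement type, number of cases).
TABLE_1 = [
    ("H. sapiens", "Deletions", 805),
    ("H. sapiens", "Duplications", 44),
    ("M. musculus", "Deletions", 245),
    ("M. musculus", "Linear", 31),
    ("R. norvegicus", "Deletions", 216),
    ("M. mulatta", "Deletions", 58),
    ("D. melanogaster", "Deletions", 35),
    ("P. anserina", "Deletions", 32),
    ("C. elegans", "Deletions", 6),
]


def build_summary_db() -> sqlite3.Connection:
    """Create an in-memory database holding the Table 1 summary."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dataset_summary ("
        "species TEXT, rearrangement_type TEXT, n INTEGER)"
    )
    conn.executemany("INSERT INTO dataset_summary VALUES (?, ?, ?)", TABLE_1)
    conn.commit()
    return conn


conn = build_summary_db()
total = conn.execute("SELECT SUM(n) FROM dataset_summary").fetchone()[0]
per_species = dict(
    conn.execute("SELECT species, SUM(n) FROM dataset_summary GROUP BY species")
)
print(total, per_species["H. sapiens"], per_species["M. musculus"])
```

The total of 1472 agrees with the number of rearrangements reported in the Data Curation and Collection section.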

CONCLUSIONS AND PERSPECTIVES 

Our goal is to continually add new mtDNA rearrangements,
either to the existing datasets or as new ones, as
well as to enhance the available analyses and visualization
methods. We encourage the submission of new data on
rearrangements to MitoBreak to be shared with the
research community. If users have large collections of rearrangements
to submit or want to add breakpoints from
a species and/or rearrangement type not present in
MitoBreak, a contact form and e-mail address are available
on the website.

Several online resources are available that describe the 
diverse features of mitochondria and mitochondrial DNA 
[e.g. MITOMAP (21), MitoTool (22), mtDB (34), HmtDB 
(35), MitoP2 (36)], but only the MITOMAP and 
MitoTool databases have information regarding mtDNA 
rearrangements. However, these two databases only 
describe mtDNA rearrangements detected in humans 
without any statistical, descriptive or visual representation
of the breakpoints. The MitoBreak database is by far the
largest set of mtDNA breakpoints currently available and
presents diverse information for each available rearrangement.
Moreover, MitoBreak allows multiple types of
interactions with the datasets so users can have a fast 
characterization of all or subsets of mtDNA rearrange- 
ments. New rearrangements can be easily analyzed in 
the light of available data, including their previous description
in a clinical context. For the first time, mtDNA
breakpoints from different species can be analyzed using a 
single platform. The comparison of mtDNA rearrange- 
ments from different species facilitates the identification 
of common sequence features in breakpoint regions, 
such as direct repeats or non-B DNA conformations. 
This information might help to delineate new experiments
in model organisms (M. musculus, D. melanogaster, C. 
elegans, etc.) to better understand how pathological 
mtDNA deletions and duplications are formed in humans. 

For all these reasons, we believe that MitoBreak will be
a useful tool to help researchers gain greater knowledge 
about mtDNA rearrangements. It will provide clinicians 
and molecular geneticists a useful resource to study new or 
previously described mtDNA deletions, which are 
associated with a wide variety of highly debilitating and 
often fatal disorders and have been implicated in aging 
and age-associated disease. MitoBreak will also be useful 
for researchers interested in designing accurate methods 
for the identification and screening of abnormal 
mtDNAs in different clinical contexts by providing
detailed information on deleted/duplicated mtDNA 
regions. Finally, our database is an easily accessible 
platform for those who might want to explore the basics 
of mtDNA organization and the general mechanisms of 
genomic rearrangements across species. 